Urban noise pollution is a growing concern for city residents, affecting both physical and mental health. In dense cities like New York, road traffic is one of the major sources of environmental noise. The high density of road networks and heavy traffic, particularly during peak hours, raises noise levels and lowers the quality of life of people living near busy roads.
The purpose of this analysis is to investigate the relationship between traffic volume and noise complaints in New York City during the year 2023. By leveraging geospatial data and statistical analysis, this project aims to determine whether high traffic volumes are directly correlated with increased noise complaints and, if so, to what extent. We will also explore the spatial distribution of traffic and noise complaints to identify potential clusters or patterns across different areas of the city.
Our working hypothesis is that areas with higher traffic volumes had a higher intensity (density) of noise complaints in New York City during 2023.
The null hypothesis is that there is no spatial relationship between traffic volume and the intensity of noise complaints; any observed patterns are due to random chance.
The alternative hypothesis is that there is a significant spatial relationship between traffic volume and noise complaints: areas with higher traffic volumes have a higher density of noise complaints.
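One way to make these hypotheses operational, sketched here on made-up numbers, is a one-sided rank correlation test between traffic volume and complaint counts per area. The two vectors below are hypothetical, and the choice of Spearman's rho is an assumption for illustration rather than the method used later in this analysis. The test also ignores spatial dependence between neighbouring areas, so it is only a first check.

```r
# Hypothetical per-area values (illustration only, not project data)
traffic_volume  <- c(120, 340, 560, 610, 800, 950, 1200, 1500)
complaint_count <- c(3, 5, 4, 9, 11, 10, 15, 18)

# H0: no monotonic association; H1: positive association
test <- cor.test(traffic_volume, complaint_count,
                 method = "spearman", alternative = "greater")

test$estimate  # Spearman's rho
test$p.value   # small values count as evidence against H0
```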
We will begin with data cleaning to ensure we are working with accurate and relevant information. This includes filtering the noise complaint data for 2023 and cleaning the traffic volume data as needed. Next, we will conduct exploratory data analysis (EDA) to understand the overall trends in traffic and noise complaints. Visualizations and spatial analyses will help us identify areas of concern. Lastly, we will conduct formal statistical analysis, including spatial autocorrelation, to assess the relationship between traffic volume and noise complaints.
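As a rough illustration of what a spatial autocorrelation statistic measures, the sketch below computes global Moran's I in base R. The function name `morans_i`, the four-area layout, and the complaint counts are all hypothetical; a real analysis would more likely build the weights from the actual geography with a dedicated package.

```r
# Global Moran's I for values x and a spatial weights matrix w
morans_i <- function(x, w) {
  n  <- length(x)
  z  <- x - mean(x)   # deviations from the mean
  s0 <- sum(w)        # sum of all weights
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical example: four areas along a line, adjacent areas are neighbours
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1   # area i is a neighbour of area i + 1
w[cbind(2:4, 1:3)] <- 1   # and the reverse, so w is symmetric

complaints <- c(10, 12, 30, 33)   # made-up counts
morans_i(complaints, w)
```

Positive values indicate that similar values sit next to each other (clustering), values near zero suggest no spatial pattern, and negative values indicate that neighbours tend to differ.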
# Package Loading
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(stringr)
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(ggplot2)
library(mapview)
library(leaflet)
library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.univar
## spatstat.univar 3.1-1
## Loading required package: spatstat.geom
## spatstat.geom 3.3-4
## Loading required package: spatstat.random
## spatstat.random 3.3-2
## Loading required package: spatstat.explore
## Loading required package: nlme
##
## Attaching package: 'nlme'
##
## The following object is masked from 'package:dplyr':
##
## collapse
##
## spatstat.explore 3.3-3
## Loading required package: spatstat.model
## Loading required package: rpart
## spatstat.model 3.3-3
## Loading required package: spatstat.linnet
## spatstat.linnet 3.2-3
##
## spatstat 3.3-0
## For an introduction to spatstat, type 'beginner'
library(raster)
## Loading required package: sp
##
## Attaching package: 'raster'
##
## The following object is masked from 'package:nlme':
##
## getData
##
## The following object is masked from 'package:dplyr':
##
## select
library(dplyr)
noise_data <- read_csv("data/Noise_Complaint.csv")
## Rows: 51256 Columns: 40
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (25): Created Date, Closed Date, Agency, Agency Name, Complaint Type, D...
## dbl (6): Unique Key, Incident Zip, X Coordinate (State Plane), Y Coordinat...
## lgl (8): Facility Type, Due Date, Taxi Company Borough, Taxi Pick Up Locat...
## dttm (1): created_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(noise_data)
## # A tibble: 6 × 40
## `Unique Key` `Created Date` `Closed Date` Agency `Agency Name`
## <dbl> <chr> <chr> <chr> <chr>
## 1 56412499 01/01/2023 07:51:18 PM 01/01/2023 08:19:51 … NYPD New York Cit…
## 2 56412502 01/01/2023 06:03:27 PM 01/01/2023 06:13:28 … NYPD New York Cit…
## 3 56412517 01/01/2023 05:29:51 PM 01/01/2023 05:59:09 … NYPD New York Cit…
## 4 56412522 01/01/2023 10:48:24 PM 01/02/2023 12:30:43 … NYPD New York Cit…
## 5 56412584 01/01/2023 05:24:31 PM 01/02/2023 04:22:22 … NYPD New York Cit…
## 6 56412597 01/01/2023 11:53:48 PM 01/02/2023 12:02:56 … NYPD New York Cit…
## # ℹ 35 more variables: `Complaint Type` <chr>, Descriptor <chr>,
## # `Location Type` <chr>, `Incident Zip` <dbl>, `Incident Address` <chr>,
## # `Street Name` <chr>, `Cross Street 1` <chr>, `Cross Street 2` <chr>,
## # `Intersection Street 1` <chr>, `Intersection Street 2` <chr>,
## # `Address Type` <chr>, City <chr>, Landmark <chr>, `Facility Type` <lgl>,
## # Status <chr>, `Due Date` <lgl>, `Resolution Description` <chr>,
## # `Resolution Action Updated Date` <chr>, `Community Board` <chr>, …
glimpse(noise_data)
## Rows: 51,256
## Columns: 40
## $ `Unique Key` <dbl> 56412499, 56412502, 56412517, 5641252…
## $ `Created Date` <chr> "01/01/2023 07:51:18 PM", "01/01/2023…
## $ `Closed Date` <chr> "01/01/2023 08:19:51 PM", "01/01/2023…
## $ Agency <chr> "NYPD", "NYPD", "NYPD", "NYPD", "NYPD…
## $ `Agency Name` <chr> "New York City Police Department", "N…
## $ `Complaint Type` <chr> "Noise - Vehicle", "Noise - Vehicle",…
## $ Descriptor <chr> "Engine Idling", "Car/Truck Horn", "C…
## $ `Location Type` <chr> "Street/Sidewalk", "Street/Sidewalk",…
## $ `Incident Zip` <dbl> 11420, 11218, 11218, 10040, 10453, 10…
## $ `Incident Address` <chr> "124-15 OLD SOUTH ROAD", "CHURCH AVEN…
## $ `Street Name` <chr> "OLD SOUTH ROAD", "CHURCH AVENUE", "C…
## $ `Cross Street 1` <chr> "124 STREET", "CHURCH AVENUE", "CHURC…
## $ `Cross Street 2` <chr> "125 STREET", "EAST 4 STREET", "EA…
## $ `Intersection Street 1` <chr> "124 STREET", "CHURCH AVENUE", "CHURC…
## $ `Intersection Street 2` <chr> "125 STREET", "EAST 4 STREET", "EA…
## $ `Address Type` <chr> "ADDRESS", "INTERSECTION", "INTERSECT…
## $ City <chr> "SOUTH OZONE PARK", NA, NA, "NEW YORK…
## $ Landmark <chr> "OLD SOUTH ROAD", NA, NA, "BOGARDUS P…
## $ `Facility Type` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Status <chr> "Closed", "Closed", "Closed", "Closed…
## $ `Due Date` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Resolution Description` <chr> "The Police Department responded to t…
## $ `Resolution Action Updated Date` <chr> "01/01/2023 08:19:54 PM", "01/01/2023…
## $ `Community Board` <chr> "10 QUEENS", "12 BROOKLYN", "12 BROOK…
## $ Borough <chr> "QUEENS", "BROOKLYN", "BROOKLYN", "MA…
## $ `X Coordinate (State Plane)` <dbl> 1035124, 990750, 990507, 1003671, 101…
## $ `Y Coordinate (State Plane)` <dbl> 181529, 173905, 173771, 252190, 25017…
## $ `Park Facility Name` <chr> "Unspecified", "Unspecified", "Unspec…
## $ `Park Borough` <chr> "QUEENS", "BROOKLYN", "BROOKLYN", "MA…
## $ `Vehicle Type` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Taxi Company Borough` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Taxi Pick Up Location` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Bridge Highway Name` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Bridge Highway Direction` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Road Ramp` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Bridge Highway Segment` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Latitude <dbl> 40.66479, 40.64400, 40.64364, 40.8588…
## $ Longitude <dbl> -73.81662, -73.97658, -73.97745, -73.…
## $ Location <chr> "(40.66478573741414, -73.816622087387…
## $ created_date <dttm> 2023-01-01 19:51:18, 2023-01-01 18:0…
# Load traffic volume data
traffic_data <- read_csv("data/traffic_volume_counts.csv")
## Rows: 1712605 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Boro, WktGeom, street, fromSt, toSt, Direction
## dbl (8): RequestID, Yr, M, D, HH, MM, Vol, SegmentID
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter for 2023 data, create datetime, and drop old columns
traffic_data_2023 <- traffic_data %>%
  filter(Yr == 2023) %>%
  mutate(datetime = make_datetime(Yr, M, D, HH, MM)) %>%
  dplyr::select(-Yr, -M, -D, -HH, -MM)
# Create a location description and drop unnecessary columns
traffic_data_2023 <- traffic_data_2023 %>%
  mutate(location_desc = str_c(street, " from ", fromSt, " to ", toSt, sep = "")) %>%
  dplyr::select(-street, -fromSt, -toSt)
# Aggregate traffic volume by hour and location
traffic_data_aggregated <- traffic_data_2023 %>%
  mutate(hour = floor_date(datetime, unit = "hour")) %>%
  group_by(location_desc, hour, Boro, Direction, WktGeom) %>%
  summarise(total_volume = sum(Vol, na.rm = TRUE), .groups = "drop")
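The hourly aggregation above can be checked on a small example. The sketch below uses a hypothetical table of 15-minute counts for a single segment (the values are made up) and applies the same `make_datetime()`, `floor_date()`, and `summarise()` steps.

```r
library(dplyr)
library(lubridate)

# Hypothetical 15-minute counts for one segment (illustration only)
counts <- tibble(
  Yr = 2023, M = 1, D = 1,
  HH = c(8, 8, 8, 8, 9),
  MM = c(0, 15, 30, 45, 0),
  Vol = c(10, 12, NA, 8, 20)
)

hourly <- counts %>%
  mutate(
    datetime = make_datetime(Yr, M, D, HH, MM),  # POSIXct, UTC by default
    hour = floor_date(datetime, unit = "hour")   # truncate to the start of the hour
  ) %>%
  group_by(hour) %>%
  summarise(total_volume = sum(Vol, na.rm = TRUE), .groups = "drop")

hourly
```

Note that `na.rm = TRUE` silently treats a missing interval as zero traffic, so an hour with missing counts is understated rather than flagged.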
Filtering and Pre-processing Noise Data
# Standardise the address, bucket complaints by hour, and keep only the columns needed later
vehicle_noise_data_2023 <- noise_data %>%
  mutate(
    location_desc = str_to_upper(str_trim(`Incident Address`)),
    hour = floor_date(created_date, unit = "hour")
  ) %>%
  filter(!is.na(location_desc) & !is.na(hour)) %>%
  dplyr::select(
    -`Unique Key`, -`Created Date`, -`Closed Date`, -`Agency`, -`Agency Name`,
    -`Vehicle Type`, -`Taxi Company Borough`, -`Taxi Pick Up Location`,
    -`Bridge Highway Name`, -`Bridge Highway Direction`, -`Road Ramp`, -`Bridge Highway Segment`,
    -`Facility Type`, -`Resolution Action Updated Date`, -`Due Date`, -`Resolution Description`,
    -`Community Board`, -`X Coordinate (State Plane)`, -`Y Coordinate (State Plane)`, -`Park Facility Name`,
    -`Park Borough`, -`Complaint Type`, -`Descriptor`, -`Location Type`, -`Address Type`, -`Status`,
    -`Street Name`, -`Cross Street 1`, -`Cross Street 2`, -`Intersection Street 1`, -`Intersection Street 2`,
    -`City`, -`Landmark`, -`created_date`, -`Incident Address`
  ) %>%
  rename(location = location_desc) %>%
  dplyr::select(location, hour, everything()) %>%
  # Drop complaints without coordinates, since they cannot be mapped
  filter(!is.na(Latitude) & !is.na(Longitude))
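A shorter way to express the same column selection is to list the columns to keep instead of the columns to drop. The sketch below shows the idea on a hypothetical two-row stand-in for the complaint table; the `toy` data and the `keep` vector are illustrative, not part of the original pipeline.

```r
library(dplyr)

# Hypothetical two-row stand-in for the complaint table (illustration only)
toy <- tibble(
  `Unique Key` = c(1, 2),
  `Incident Address` = c(" 1 main st", NA),
  `Incident Zip` = c(10001, 10002),
  Borough = c("MANHATTAN", "MANHATTAN"),
  Latitude = c(40.75, 40.76),
  Longitude = c(-73.99, -73.98),
  Agency = "NYPD"
)

# Columns to keep, named explicitly
keep <- c("location", "Incident Zip", "Borough", "Latitude", "Longitude")

toy_clean <- toy %>%
  mutate(location = toupper(trimws(`Incident Address`))) %>%
  filter(!is.na(location), !is.na(Latitude), !is.na(Longitude)) %>%
  dplyr::select(all_of(keep))  # dplyr:: prefix because raster masks select()

toy_clean
```

Listing the kept columns also makes the step robust to new columns appearing in a later export of the data.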
The traffic volume data has no latitude and longitude columns; each record carries a route description and a WKT geometry string (WktGeom). The vehicle noise data does have coordinates, so we link the two with a proximity-based spatial join in which each complaint is matched to its nearest traffic count location.
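The proximity-based join can be sketched on toy data as follows. The two traffic locations and three complaint points are hypothetical, and the `dist_m` column is an addition for illustration that is not part of the original pipeline.

```r
library(sf)

# Hypothetical traffic count locations (illustration only)
traffic <- st_as_sf(
  data.frame(segment = c("A", "B"),
             lon = c(-73.99, -73.90), lat = c(40.75, 40.85)),
  coords = c("lon", "lat"), crs = 4326
)

# Hypothetical complaint locations
complaints <- st_as_sf(
  data.frame(id = 1:3,
             lon = c(-73.98, -73.91, -73.995), lat = c(40.76, 40.84, 40.74)),
  coords = c("lon", "lat"), crs = 4326
)

# Each complaint receives the attributes of its nearest traffic feature
joined <- st_join(complaints, traffic, join = st_nearest_feature)

# Distance to the matched feature, useful for discarding far-away matches
nearest_idx <- st_nearest_feature(complaints, traffic)
joined$dist_m <- st_distance(complaints, traffic[nearest_idx, ], by_element = TRUE)

joined
```

Two properties of this join are worth keeping in mind. It always returns a match, however far away the nearest feature is, and it matches on location only, so it does not align a complaint's hour with the traffic count for that hour.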
Convert Traffic Volume and Noise Complaint Data to sf Objects
# Rename and convert traffic data to sf object
traffic_data_aggregated <- traffic_data_aggregated %>%
  rename(Borough = Boro)
# Note: crs = 4326 declares the WktGeom coordinates to be longitude/latitude.
# The warning printed after the join below suggests this may not hold,
# so the coordinate system of WktGeom should be checked.
traffic_data_aggregated_sf <- traffic_data_aggregated %>%
  st_as_sf(wkt = "WktGeom", crs = 4326)
# Convert vehicle noise data to sf object
vehicle_noise_data_2023_sf <- vehicle_noise_data_2023 %>%
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)
# Spatial join to link each noise complaint with its nearest traffic feature
vehicle_noise_with_traffic <- st_join(
  vehicle_noise_data_2023_sf,
  traffic_data_aggregated_sf,
  join = st_nearest_feature
)
## Warning in st_is_longlat(x): bounding box has potentially an invalid value
## range for longlat data
# Convert the result to a tibble to avoid select issues
merged_data <- as_tibble(vehicle_noise_with_traffic)
# Clean up columns and rename for consistency
merged_data <- merged_data %>%
  dplyr::select(-Borough.y, -hour.y) %>%
  rename(Boro = Borough.x)
str(merged_data)
## tibble [50,448 × 9] (S3: tbl_df/tbl/data.frame)
## $ location : chr [1:50448] "124-15 OLD SOUTH ROAD" "CHURCH AVENUE" "CHURCH AVENUE" "1 BOGARDUS PLACE" ...
## $ hour.x : POSIXct[1:50448], format: "2023-01-01 19:00:00" "2023-01-01 18:00:00" ...
## $ Incident Zip : num [1:50448] 11420 11218 11218 10040 10453 ...
## $ Boro : chr [1:50448] "QUEENS" "BROOKLYN" "BROOKLYN" "MANHATTAN" ...
## $ Location : chr [1:50448] "(40.66478573741414, -73.81662208738771)" "(40.64400329864003, -73.97657773481212)" "(40.64363567249765, -73.9774534914305)" "(40.85885672421463, -73.92979179302147)" ...
## $ geometry :sfc_POINT of length 50448; first list element: 'XY' num [1:2] -73.8 40.7
## $ location_desc: chr [1:50448] "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" "VALENTINE AVENUE from East 178 Street to East Burnside Avenue" ...
## $ Direction : chr [1:50448] "NB" "NB" "NB" "NB" ...
## $ total_volume : num [1:50448] 266 266 266 266 266 266 266 266 266 266 ...
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr [1:10] "location" "hour.x" "Incident Zip" "Borough.x" ...
summary(merged_data)
## location hour.x Incident Zip
## Length:50448 Min. :2023-01-01 00:00:00.0 Min. :10000
## Class :character 1st Qu.:2023-04-16 19:00:00.0 1st Qu.:10039
## Mode :character Median :2023-06-25 00:00:00.0 Median :10468
## Mean :2023-06-30 12:12:56.9 Mean :10736
## 3rd Qu.:2023-09-14 23:00:00.0 3rd Qu.:11232
## Max. :2023-12-31 23:00:00.0 Max. :11694
## NA's :1
## Boro Location geometry location_desc
## Length:50448 Length:50448 POINT :50448 Length:50448
## Class :character Class :character epsg:4326 : 0 Class :character
## Mode :character Mode :character +proj=long...: 0 Mode :character
##
##
##
##
## Direction total_volume
## Length:50448 Min. :266
## Class :character 1st Qu.:266
## Mode :character Median :266
## Mean :266
## 3rd Qu.:266
## Max. :266
##
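In the summary above, total_volume is 266 at every quantile, and the str() output shows the same location_desc in each of the first rows. Together with the st_is_longlat() warning, this suggests that every complaint was matched to a single traffic segment because the WktGeom coordinates are not longitude/latitude. The sketch below assumes they are in EPSG:2263, the State Plane system used by the noise data's X/Y Coordinate (State Plane) columns; that assumption should be verified against the traffic dataset's documentation. It uses the first complaint's State Plane coordinates from glimpse() as a stand-in WKT point.

```r
library(sf)

# One WKT point built from the first complaint's State Plane X/Y shown by glimpse().
# A real check would use rows of the traffic data instead.
toy <- data.frame(
  id = 1,
  WktGeom = "POINT (1035124 181529)"
)

# Declare the CRS the coordinates are assumed to be in (EPSG:2263) ...
toy_sf <- st_as_sf(toy, wkt = "WktGeom", crs = 2263)

# ... then reproject to longitude/latitude so both layers share EPSG:4326
toy_ll <- st_transform(toy_sf, 4326)

st_coordinates(toy_ll)  # should land near the reported -73.81662, 40.66479
```

If the same pattern holds for the traffic data, replacing `crs = 4326` with `crs = 2263` followed by `st_transform(4326)` in the conversion step would be the corresponding change, and the join and all downstream summaries would need to be rerun.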
Summary Statistics for Key Variables
# Summary of complaints by borough
borough_summary <- merged_data %>%
  group_by(Boro) %>%
  summarise(total_complaints = n(), .groups = "drop")
print(borough_summary)
## # A tibble: 6 × 2
## Boro total_complaints
## <chr> <int>
## 1 BRONX 11756
## 2 BROOKLYN 12383
## 3 MANHATTAN 13702
## 4 QUEENS 11674
## 5 STATEN ISLAND 931
## 6 Unspecified 2
# Summary of complaints by hour of the day
merged_data <- merged_data %>%
  mutate(hour = hour(hour.x))
hourly_summary <- merged_data %>%
  group_by(hour) %>%
  summarise(total_complaints = n(), .groups = "drop")
print(hourly_summary)
## # A tibble: 24 × 2
## hour total_complaints
## <int> <int>
## 1 0 4050
## 2 1 2634
## 3 2 1698
## 4 3 1259
## 5 4 1195
## 6 5 844
## 7 6 786
## 8 7 789
## 9 8 904
## 10 9 888
## # ℹ 14 more rows
Noise Complaints by Borough
# Plot the number of noise complaints by borough
ggplot(merged_data, aes(x = Boro, fill = Boro)) +
  geom_bar() +
  labs(
    title = "Total Number of Noise Complaints by Borough",
    x = "Borough",
    y = "Number of Complaints"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Noise Complaints by Hour of the Day
# Plot the number of noise complaints by hour of the day
ggplot(merged_data, aes(x = hour)) +
  geom_bar(fill = "blue", alpha = 0.6) +
  labs(
    title = "Noise Complaints by Hour of the Day",
    x = "Hour of the Day",
    y = "Number of Complaints"
  ) +
  theme_minimal() +
  scale_x_continuous(breaks = seq(0, 23, by = 1)) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )
Spatial Distribution of Noise Complaints
mapview(vehicle_noise_data_2023_sf, zcol = "Borough")
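Since the hypothesis is stated in terms of complaint intensity, a kernel density estimate is one possible complement to the point map. The sketch below uses spatstat on randomly generated points in a hypothetical 1000 x 1000 window; the coordinates, window, and bandwidth `sigma = 100` are all assumptions chosen for illustration. spatstat works in planar coordinates, so real complaint locations would need to be projected first.

```r
library(spatstat)

# Hypothetical projected coordinates in a 1000 x 1000 window (illustration only)
set.seed(1)
x <- runif(200, 0, 1000)
y <- runif(200, 0, 1000)

# Point pattern object: locations plus the observation window
pts <- ppp(x, y, window = owin(c(0, 1000), c(0, 1000)))

# Gaussian kernel estimate of intensity (points per unit area); sigma is the bandwidth
dens <- density(pts, sigma = 100)

plot(dens, main = "Estimated complaint intensity (toy data)")
```

The bandwidth controls how smooth the surface is: small values highlight individual clusters, while large values show only broad trends.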